This project was completed at Texas Tech University as part of the Multivariate Analysis class. The dataset comes from Gapminder World, which aggregates data from several sources. The goal is to compare countries from economic, social, and governmental perspectives.
The aim of this post is to present the results of the analysis; the complete Report can be found on GitHub. The Report describes the variables and their sources, as well as the data-cleaning process. It also interprets the visualizations presented here in more detail.
The complete code for loading, merging, and cleaning the data is presented here:
library(readr)
library(readxl)
library(dplyr)
library(countrycode)
library(car)
source('Read_Clean.R')
cleaned <- Read_Clean()
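Merging several sources usually requires a common country key. The contents of `Read_Clean.R` are not shown in this post, so the following is only a minimal sketch of how the `countrycode` package can standardize country names and attach a continent label. The country names are hypothetical examples, not rows from the Gapminder data.

```r
library(countrycode)

# Hypothetical country names, used only for illustration.
countries <- c("Poland", "Brazil", "Japan")

# Map English country names to ISO 3166-1 alpha-3 codes,
# which make a reliable key for merging tables from different sources.
iso3 <- countrycode(countries, origin = "country.name", destination = "iso3c")

# Map the same names to a continent label.
continent <- countrycode(countries, origin = "country.name", destination = "continent")

lookup <- data.frame(country = countries, iso3 = iso3, continent = continent)
print(lookup)
```

Note that `countrycode` groups North and South America under a single "Americas" label, so a finer split would need a different destination code or a manual mapping.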
Continents presented in 3D space
To give a general insight into the data, all countries were plotted in 3-dimensional space. At first glance, clusters by continent are visible: countries on the same continent generally show a similar profile. The code can be found on GitHub.
Interpretation of PC1, PC2, and PC3 is as follows:
PC1: highly loaded in variables such as number of phones, life expectancy, Corruption index, Access to the Internet and Income.
PC2: highly loaded in the number of suicides and sex ratio.
PC3: especially meaningful in the context of inequality.
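An interpretation like the one above comes from reading the loadings of each component. The sketch below shows how loadings and explained variance can be inspected with base R's `prcomp`. It uses the built-in `USArrests` dataset purely as a stand-in, since the cleaned Gapminder table is not reproduced here, and it is not necessarily the exact procedure used in the Report.

```r
# Illustrative PCA on a built-in dataset (USArrests), not the Gapminder data.
# Variables are centered and scaled so that units do not dominate the result.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: the weight of each original variable in PC1, PC2, and PC3.
# Variables with large absolute values drive the meaning of a component.
loadings <- pca$rotation[, 1:3]
print(round(loadings, 2))

# Scores: the coordinates of each observation in the PC1..PC3 space,
# which is what a PC1 vs PC2 scatter plot displays.
scores <- pca$x[, 1:3]
print(head(scores))
```

A variable is described as "highly loaded" on a component when its absolute loading is large relative to the others in the same column.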
Summary Result for the first three Principal Components
Plot PC1 vs PC2
PrinCompPlot[1]
- On the right of the plot, at high values of PC1, highly developed countries in Europe and North America can be seen. Those continents are above average in terms of lower corruption, life expectancy, Internet access, number of phones, and income.
Plot PC1 vs PC3
PrinCompPlot[2]